You can use any dataset accessible through the pandas-datareader module for this task. Explain what the dataset is about and which part of it you will visualize.
If the pandas-datareader package is not available in your notebook, either stop, delete and restart your course container on the Kooplex hub page, or install it into your userspace:
pip install --user pandas-datareader
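Whether the package is importable can be checked from a notebook cell before deciding to reinstall. The following is a minimal sketch using only the standard library; the helper name `is_installed` is our own and not part of the course material.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Note the underscore: the package is installed as "pandas-datareader"
# but imported as "pandas_datareader".
print("pandas_datareader available:", is_installed("pandas_datareader"))
```

If the check prints `False`, the `pip install --user` command above is the next step.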
When you have finished with the notebook, run
jupyter-nbconvert --execute worksheet-interactive.ipynb
You can find further interactive tools on the pyviz site:
This report presents a few interactive plots featuring miscellaneous, mainly monetary, economic and environmental data accessed via pandas-datareader. The contents of the different sections are usually not strongly related.
#pip install --user pandas-datareader
%pylab inline
from pandas_datareader import wb
from bokeh.resources import INLINE
import bokeh.io
from bokeh import *
bokeh.io.output_notebook(INLINE)
import holoviews as hv
import pandas as pd
hv.extension('bokeh')
First, we examine two monetary measures, GDP per capita and GDP per capita at purchasing power parity (PPP), for the V4 (Visegrád Four) countries: Hungary, Poland, Slovakia and the Czech Republic. A list of related data sources is shown below.
matches = wb.search('gdp.*capita.*const')
matches
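The search string above is a regular expression. As far as we understand, `wb.search` matches it case-insensitively against the indicator names, so the behaviour can be sketched with the standard `re` module. The names in the list below are illustrative assumptions for this sketch and are not fetched from the World Bank API.

```python
import re

# Illustrative indicator names (assumed for this sketch, not downloaded).
names = [
    "GDP per capita (constant 2015 US$)",
    "GDP per capita, PPP (constant 2017 international $)",
    "GDP per capita (current US$)",
    "Population, total",
]

# Same pattern as in the search above; ".*" allows any text in between.
pattern = re.compile(r"gdp.*capita.*const", re.IGNORECASE)

hits = [name for name in names if pattern.search(name)]
for name in hits:
    print(name)
```

Only the two "constant" series match; the "current US$" series is excluded because it lacks the substring `const`.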
from bokeh.plotting import figure, show, ColumnDataSource
from bokeh.layouts import row
from bokeh.models import Div, RangeSlider, Spinner, Legend, LabelSet, FreehandDrawTool
from holoviews.plotting.links import DataLink
countries = ['HUN', 'POL', 'SVK', 'CZE']
colors = ["green", "red", 'blue', 'indigo', 'cornflowerblue']
TOOLTIPS = [ ("GDP per capita", "$y{0.1f}"), ("year", "$x{F}") ]
p = figure(width=450, height=350, x_axis_label="Year",
y_axis_label="GDP per capita [2015 US$]", tooltips=TOOLTIPS)
r = p.multi_line(line_width=5, alpha=0.4, color='black')
draw_tool = FreehandDrawTool(renderers=[r])
p.add_tools(draw_tool)
# one line per country; zip stops at the shorter list (four countries)
for c, cl in zip(countries, colors):
    dat = wb.download(indicator='NY.GDP.PCAP.KD', country=[c], start=2000, end=2020)
    dat.rename(columns={'NY.GDP.PCAP.KD': 'GDP'}, inplace=True)
    #source = ColumnDataSource(data=dict(x=np.arange(2000,2021), y=list(dat['GDP'][::-1])))
    # the downloaded rows are reversed so that they line up with the ascending years
    p.line(x=np.arange(2000,2021), y=dat['GDP'][::-1], line_color=cl, line_width=3, legend_label=c)
p.legend.location = "top_left"
p.legend.click_policy="mute"
TOOLTIPS = [ ("PPP", "$y{0.1f}"), ("year", "$x{F}") ]
p2 = figure(width=450, height=350, x_axis_label="Year",
y_axis_label="PPP [2017 international dollars]", tooltips=TOOLTIPS)
r = p2.multi_line(line_width=5, alpha=0.4, color='black')
draw_tool = FreehandDrawTool(renderers=[r])
p2.add_tools(draw_tool)
# same loop as for the left panel, now with the PPP indicator
for c, cl in zip(countries, colors):
    dat = wb.download(indicator='NY.GDP.PCAP.PP.KD', country=[c], start=2000, end=2020)
    dat.rename(columns={'NY.GDP.PCAP.PP.KD': 'PPP'}, inplace=True)
    p2.line(x=np.arange(2000,2021), y=dat['PPP'][::-1], line_color=cl, line_width=3, legend_label=c)
p2.legend.location = "top_left"
p2.legend.click_policy="mute"
show(row(p, p2))
The plot above shows the GDP per capita (left) and PPP (right) data for the V4 countries in the time interval 2000-2020. GDP per capita and PPP are measured in 2015 US dollars and 2017 international dollars, respectively. A more detailed description of these two indicators (provided by the datareader) is printed below.
print(matches['sourceNote'][10389])
print(matches['sourceNote'][10393])
The two plots above clearly show the effect of both the Great Recession between 2007 and 2009 and the COVID-19 pandemic starting in late 2019. Both events reduced GDP per capita as well as PPP. One may also notice that, while the pandemic clearly had an impact on all four countries, the Great Recession seemed to spare Poland. Furthermore, the following map of real GDP growth rates in 2009 suggests that a fair proportion of countries, mainly in Africa and the Asia-Pacific, were not significantly affected (Poland among them). (Note: this figure is not self-made; see the source below the figure.)
Source: https://en.wikipedia.org/wiki/Great_Recession
With the interactive plot below you can browse through a few countries to visually check their economic growth (in terms of GDP per capita and PPP) during these two global events, and in general in this century. Apparently, the huge economic growth of China was never reversed in the studied period; however, the impact of the COVID pandemic does manifest in a decreased growth rate in both GDP per capita and PPP.
cols = ['GDP', 'PPP']
countries = ['HUN', 'POL', 'USA', 'DEU', 'CHN', 'BWA', 'IND', 'UKR', 'GBR', 'BRA', 'AUS', 'JPN']
cs = {}
for country in countries:
    cs[country] = wb.download(indicator=['NY.GDP.PCAP.KD', 'NY.GDP.PCAP.PP.KD'], country=[country],
                              start=2000, end=2020)
    cs[country].rename(columns={'NY.GDP.PCAP.KD': 'GDP', 'NY.GDP.PCAP.PP.KD': 'PPP'}, inplace=True)
curve_dict = {(c2, c): hv.Curve((np.arange(2000,2021), cs[c2][c][::-1]), kdims=['year'], vdims=[c]).opts(
width=650, height=450, line_width=3,
xlabel="year", ylabel="measure [dollar]", tools=['hover'])
for c2 in countries for c in cols}
hmap = hv.HoloMap(curve_dict, kdims=['country', 'measure'])
hmap
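To put a number on the slowdowns discussed in this section, the year-over-year growth rate can be computed from any of the series. The sketch below uses made-up values rather than downloaded data; it also assumes the series is ordered oldest year first, whereas the downloaded frames appear to list the most recent year first (which is why the cells above reverse them with `[::-1]`).

```python
import pandas as pd

# Hypothetical GDP-per-capita values, oldest year first (not real data).
gdp = pd.Series(
    [100.0, 104.0, 98.8, 101.764],
    index=[2007, 2008, 2009, 2010],
    name="GDP",
)

# Percentage change relative to the previous year; the first entry is NaN.
growth = gdp.pct_change() * 100

# Years in which the series shrank.
recession_years = growth[growth < 0].index.tolist()

print(growth.round(2))
print("years with negative growth:", recession_years)
```

Applied to a real series, a run of negative values marks a contraction, while a smaller but still positive value marks a slowdown such as the one seen for China.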
In this section we study the correlation of PPP with other technological and social quantities. First, we compare PPP to mobile cellular network coverage. To this end, global and African data sources are provided (see the list below); however, the former did not work for us. Judging by the UserWarning, this source was probably deleted or archived, hence the data restricted to Africa was used.
matches = wb.search('cell.*%')
matches
The description of the data (see below) emphasizes that these numbers do not correspond to the ratio of actual mobile users. The data measures the ratio of the population with theoretical ability to use mobile cellular services if one has a cellular telephone and a subscription.
print(matches['sourceNote'][8000])
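The download cell that follows removes countries with missing data in a slightly indirect way: `groupby("country").sum()` skips missing values, so a country without data for an indicator ends up with 0 in that column, and rows containing a 0 are then dropped. A minimal sketch on a toy frame (hypothetical countries and values) shows that this gives the same result as calling `dropna()` directly, with the caveat that the zero-based filter would also discard a genuine value of 0.

```python
import numpy as np
import pandas as pd

# Toy frame shaped like a download result: (country, year) index.
# Countries and numbers are hypothetical.
index = pd.MultiIndex.from_tuples(
    [("A", "2011"), ("B", "2011"), ("C", "2011")],
    names=["country", "year"],
)
toy = pd.DataFrame(
    {"PPP": [1000.0, np.nan, 3000.0], "cell phone": [80.0, 55.0, np.nan]},
    index=index,
)

# Approach used in the notebook: NaN becomes 0 in the sum, then drop zeros.
summed = toy.groupby("country").sum()
kept_zero_filter = summed.loc[~(summed == 0).any(axis=1)]

# More direct alternative: drop rows with any missing value.
kept_dropna = toy.droplevel("year").dropna()

print(kept_zero_filter)
print(kept_dropna)
```

Both variants keep only country A here, since B lacks a PPP value and C lacks a coverage value.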
from bokeh.plotting import figure, show, ColumnDataSource
from bokeh.models import Div, RangeSlider, Spinner, Legend, LabelSet, FreehandDrawTool
ind = ['NY.GDP.PCAP.PP.KD', 'IT.MOB.COV.ZS']
cell = wb.download(indicator=ind, country='all', start=2011, end=2011).groupby("country").sum()
cell = cell.loc[~(cell==0).any(axis=1)]
cell.rename(columns={'NY.GDP.PCAP.PP.KD': 'PPP', 'IT.MOB.COV.ZS': 'cell phone'}, inplace=True)
source = ColumnDataSource(data=dict(x=cell['PPP'], y=cell['cell phone'], country=list(cell.index)))
TOOLTIPS = [ ("Country", "@country"), ("Mobile network [%]", "$y{0.1f}"), ("PPP", "$x{0.1f}") ]
p = figure(width=700, height=500, x_axis_label="PPP [2017 international dollars]",
y_axis_label="Population covered by mobile cellular network [%]", tooltips=TOOLTIPS, title='year: 2011')
r = p.multi_line(line_width=5, alpha=0.4, color='black')
draw_tool = FreehandDrawTool(renderers=[r])
p.add_tools(draw_tool)
p.circle(source=source, line_color="green", line_width=3, size=15, fill_alpha=0.3)
show(p)
The plot above shows PPP (measured in 2017 international dollars) and the share of the population covered by a mobile cellular network for African countries in 2011. There is a moderate positive correlation of 0.43 between the two quantities (computed below). Not so surprisingly, the best network coverage is found in the island states of the Indian Ocean (Seychelles and Mauritius), where the population is concentrated in a small area, and in the great economic success story of the continent, Botswana, often referred to as the Switzerland of Africa.
from scipy.stats import pearsonr
pearsonr(cell['PPP'], cell['cell phone'])[0]
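For reference, the Pearson coefficient is the covariance of the two samples divided by the product of their standard deviations. The sketch below spells this out in plain Python on made-up numbers; the function name `pearson_r` is our own, and the notebook itself relies on `scipy.stats.pearsonr`.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors cancel between numerator and denominator.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up sample: mostly increasing together, with two swaps.
r = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(f"r = {r:.2f}")  # r = 0.80
```

A value of +1 or -1 means a perfect linear relationship, while values near 0 mean no linear relationship.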
matches=wb.search('energy')
matches.head()
Secondly, we study the share $R$ of renewable energy in each state and its correlation with PPP. A sample of the data sources on this topic is listed above, and the description of the data used is printed below.
print(matches['sourceNote'][166])
renew = {}
for year in range(2000, 2016):
    ind = ['2.1_SHARE.TOTAL.RE.IN.TFEC', 'NY.GDP.PCAP.PP.KD']
    renew[year] = wb.download(indicator=ind, country='all', start=year, end=year).groupby("country").sum()
    # missing values become 0 in the sum; drop countries with any zero entry
    renew[year] = renew[year].loc[~(renew[year]==0).any(axis=1)]
    renew[year].rename(columns={'2.1_SHARE.TOTAL.RE.IN.TFEC': 'renewable energy', 'NY.GDP.PCAP.PP.KD': 'GDP'}, inplace=True)
    renew[year]['country'] = renew[year].index
years = np.arange(2000, 2016)
curve_dict = {year: hv.Scatter(data=renew[year], kdims=['GDP'], vdims=['renewable energy', 'country']).opts(
width=650, height=450, line_color="blue", line_width=3, size=15, fill_alpha=0.3, tools=['hover'],
xlabel="PPP [2017 international dollars]", ylabel="R [%]")
for year in years}
hmap = hv.HoloMap(curve_dict, kdims=['year'])
hmap
The scatter plot above presents PPP (in 2017 international dollars) and the share $R$ of renewable energy. The plot suggests an anticorrelation between the two quantities. If we look at the latest state (of 2015) we can observe some tendencies. Such as
As shown below, PPP and $R$ are anticorrelated with a Pearson correlation of roughly -0.5. However, in the more recent years of the examined period the anticorrelation is slightly weaker, for a reason unknown to us.
from scipy.stats import pearsonr
corr = {}
for y in range(2000, 2016):
    corr[y] = pearsonr(renew[y]['GDP'], renew[y]['renewable energy'])[0]
corr.keys()
TOOLTIPS = [ ("PPP-R energy correlation", "$y{0.2f}"), ("year", "$x{F}") ]
p = figure(width=650, height=350, x_axis_label="Year",
y_axis_label="PPP-renewable energy correlation", tooltips=TOOLTIPS, y_range=(-0.55, -0.4))
p.line(x=list(corr.keys()), y=list(corr.values()), line_color='blue', line_width=3)
show(p)
We also compared the urbanization of the population with PPP. A short list of similar data sources is shown first, followed by the description of the data used. Let us denote the share of urban population by $U$, where urban population refers to people living in urban areas as defined by national statistical offices.
matches2 = wb.search('total population')
matches2.tail()
print(matches2.loc[13936,'sourceNote'])
renew = {}
year=2015
ind = ['2.1_SHARE.TOTAL.RE.IN.TFEC', 'NY.GDP.PCAP.PP.KD', 'SP.URB.TOTL.IN.ZS']
renew[year] = wb.download(indicator=ind, country='all', start=year, end=year).groupby("country").sum()
renew[year] = renew[year].loc[~(renew[year]==0).any(axis=1)]
renew[year].rename(columns={'2.1_SHARE.TOTAL.RE.IN.TFEC': 'renewable energy', 'NY.GDP.PCAP.PP.KD': 'PPP', 'SP.URB.TOTL.IN.ZS': 'urban'}, inplace=True)
renew[year]['country'] = renew[year].index
# create a column data source for the plots to share
source = ColumnDataSource(data=dict(x=renew[2015]['PPP'], y=renew[2015]['renewable energy'],
z=renew[2015]['urban'], country=list(renew[2015].index)))
TOOLS = "pan,box_zoom,wheel_zoom,box_select,lasso_select"
TOOLTIPS = [ ("Country", "@country"), ("renewable energy [%]", " $y{0.1f}"), ("PPP [2017 int. $]", "$x{F}") ]
TOOLTIPS2 = [ ("Country", "@country"), ("urban population [%]", "$y{0.1f}"), ("PPP [2017 int. $]", "$x{F}") ]
p1 = figure(width=450, height=350, tools=TOOLS, tooltips=TOOLTIPS,
x_axis_label="PPP [2017 international dollar]", y_axis_label="R [%]", title="year: 2015")
p1.scatter('x', 'y', source=source, fill_color='blue')
p2 = figure(width=450, height=350, tools=TOOLS, tooltips=TOOLTIPS2,
x_axis_label="PPP [2017 international dollar]", y_axis_label="U [%]", title="year: 2015")
p2.scatter('x', 'z', source=source, fill_color='blue')
show(row(p2, p1))
In the linked figures above both $R$ and $U$ are shown as functions of PPP (in 2017 international dollars) in 2015. Here, with the selection tools one can filter the data for, e.g., a certain range of $U$ or $R$. This way one can discover that, e.g., the share of renewable energy is very low (typically <10%) in very highly (>90%) urbanized states. Below, the PPP-$U$ and PPP-$R$ correlations are both computed for that year. It was already shown above that $R$ and PPP are anticorrelated. $U$ and PPP correlate with a fairly high positive Pearson coefficient of 0.65. This is not surprising, as one may expect more urbanized countries to be wealthier and more rural states in the developing world to be poorer.
from scipy.stats import pearsonr
print("C(PPP, U) = %1.2f" % pearsonr(renew[2015]['PPP'], renew[2015]['urban'])[0] )
print("C(PPP, R) = %1.2f" % pearsonr(renew[2015]['PPP'], renew[2015]['renewable energy'])[0] )
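The box and lasso selection described above has a non-interactive counterpart: a boolean mask on the data frame. The sketch below illustrates the idea on a toy frame; the countries and values are hypothetical, only the column names `urban` and `renewable energy` follow the notebook.

```python
import pandas as pd

# Toy stand-in for renew[2015]; countries and values are hypothetical.
toy = pd.DataFrame(
    {
        "urban": [95.0, 92.0, 60.0, 30.0],
        "renewable energy": [2.0, 8.0, 25.0, 70.0],
    },
    index=["W", "X", "Y", "Z"],
)

# Equivalent of selecting the points with U > 90% in the right-hand plot.
highly_urban = toy[toy["urban"] > 90]

# Fraction of the selected countries whose renewable share is below 10%.
share_low_renewable = (highly_urban["renewable energy"] < 10).mean()

print(highly_urban)
print("fraction with R < 10%:", share_low_renewable)
```

Running the same two lines on `renew[2015]` would turn the visual impression from the linked plots into a number.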
Lastly, we compare the CO$_2$ emission of the G7 countries (USA, Canada, Germany, France, UK, Italy and Japan). As can be seen below, data on CO$_2$ emission by economic sector is available from the Country Climate and Development Report (CCDR).
t = wb.search('co2')
t.head()
We chose to consider the data corresponding to the total CO$_2$ emission (see the original description printed below). The numbers also include the contribution of land-use change and forestry (LUCF) and are measured in Mt CO$_2$ equivalent. 1 Mt CO$_2$ equivalent is the amount of greenhouse gas that has the same global-warming potential as 1 Mt of CO$_2$.
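The notion of CO$_2$ equivalent can be made concrete with a small worked example: each gas mass is multiplied by its global-warming potential (GWP) relative to CO$_2$ and the results are added up. The GWP factors below are assumed, illustrative 100-year values (of the size reported in IPCC AR5); the data source may use a different convention, and the emission masses are made up.

```python
# Assumed 100-year global-warming potentials relative to CO2.
# Illustrative values only; the dataset may rely on other factors.
GWP_100 = {"CO2": 1.0, "CH4": 28.0, "N2O": 265.0}


def co2_equivalent_mt(emissions_mt):
    """Convert a dict {gas: mass in Mt} to a total in Mt CO2 equivalent."""
    return sum(mass * GWP_100[gas] for gas, mass in emissions_mt.items())


# Hypothetical emissions of one country in one year, in Mt of each gas.
example = {"CO2": 100.0, "CH4": 1.0, "N2O": 0.2}

total = co2_equivalent_mt(example)
print(f"total: {total:.1f} Mt CO2 equivalent")
```

In this example 1 Mt of methane contributes as much as 28 Mt of CO$_2$, which is why small masses of other gases can matter in the total.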
print(t.loc[2026,'name'])
co2 = wb.download(indicator='CC.CO2.EMSE.IL', country=['CAN', 'FRA', 'DEU', 'ITA', 'JPN', 'GBR', 'USA'], start=2015, end=2015)
from bokeh.plotting import figure, show, ColumnDataSource
from bokeh.palettes import Category10, Blues
from bokeh.transform import cumsum
import pandas as pd
x = {}
# the index entries are (country, year) tuples; keep the country name only
for i in co2.index:
    x[i[0]] = co2.loc[i, 'CC.CO2.EMSE.IL']
data = pd.Series(x).reset_index(name='value').rename(columns={'index': 'country'})
data['angle'] = data['value']/data['value'].sum() * 2*pi
data['color'] = Blues[len(x)][::-1]
p = figure(height=350, title="CO2 emission of G7 countries in 2015 [Mt CO2 equivalent]",
tools="hover", toolbar_location=None, tooltips="@country: @value{0.1f}", x_range=(-0.5, 1.0))
p.wedge(x=0, y=1, radius=0.4,
start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),
line_color="white", fill_color='color', legend_field='country', source=data)
p.axis.axis_label = None
p.axis.visible = False
p.grid.grid_line_color = None
show(p)
The pie chart above presents the CO$_2$ emission of the G7 countries. The largest contributor is, not surprisingly, the USA. It is also clearly visible that North America is overrepresented: the USA and Canada account for about 2/3 of the total, although their combined population is approx. 355 million people (according to the population figures used below), while the other five states have a total population of approx. 400 million.
It is even more remarkable, and much more surprising for us, that Canada, not the USA, has the highest specific (per capita) emission (see the figure below). Further sector-by-sector analysis of the emissions could reveal the reason for that; however, this is beyond the scope of this report. (The source of population data in 2015: https://en.wikipedia.org/wiki/List_of_countries_by_population_in_2015)
pop = {
'Canada': 34108752,
'Germany': 78728000,
'France': 64395345,
'United Kingdom': 64715810,
'Italy': 60340328,
'Japan': 126573481,
'United States': 321418820
}
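# emission in Mt divided by population, one entry per (country, year) row
co2_spec = [co2.loc[i, 'CC.CO2.EMSE.IL'] / pop[i[0]] for i in co2.index]
# convert Mt per person to t per person
co2['specific'] = np.array(co2_spec) * 1e6
co2.sort_values(by=['specific'], ascending=False, inplace=True)
# country names in the sorted order, used as categorical x axis
countries = [c for (c, y) in co2.index]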
p = figure(height=350, title="specific CO2 emission of G7 countries in 2015",
y_axis_label="CO2 emission [t CO2 equivalent per capita]",
tools="pan", x_range=countries)
p.vbar(x=countries, top=co2.loc[:,'specific'], width=0.5, fill_color='navy', line_color='navy')
show(p)
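The unit conversion behind the bar chart is simple but easy to get wrong: total emission is given in megatonnes (1 Mt = 10$^6$ t), so dividing by the population and multiplying by 10$^6$ yields tonnes per person. A minimal sketch with made-up numbers follows; the function name `per_capita_tonnes` is our own.

```python
def per_capita_tonnes(emission_mt: float, population: int) -> float:
    """Convert a total emission in Mt to tonnes per person."""
    if population <= 0:
        raise ValueError("population must be positive")
    return emission_mt * 1e6 / population


# Hypothetical country: 500 Mt emitted by 50 million inhabitants.
value = per_capita_tonnes(500.0, 50_000_000)
print(f"{value:.1f} t per person")  # 10.0 t per person
```

This is the same operation as the `* 1e6` step in the cell above, applied to one country at a time.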
In this report we have shown a collage of monetary, economic and environmental data accessed via pandas-datareader in interactive plots. In the first section the negative effect of the Great Recession (2008) and the COVID-19 pandemic was presented in terms of GDP per capita and PPP. It was discovered that the impact of the Great Recession was not severe in a fair number of (mainly developing) countries. In the second section it was shown that PPP correlates strongly with the extent of urbanization and is moderately anticorrelated with the share of renewable energy. It was also shown that cellular mobile network coverage correlates with PPP in African countries and is highest in the island states of the Indian Ocean. In the last section the CO$_2$ emission of the G7 countries was studied. We found that North America is overrepresented in CO$_2$ emission and, more surprisingly, that the specific (per capita) emission is highest in Canada.